Improved sequence variant analysis strategy by automated false positive removal
Abstract
Sequence variant analysis (SVA) is critical in therapeutic protein development because it verifies the absence of genetic mutations in a production clone or of high-level misincorporations during cell culture. Although software for finding sequence variants in mass spectrometry data is available, effectively distinguishing true positives from the large number of false positives among the hits reported in error-tolerant search mode remains a challenge. This verification process must be done manually and can take several days or even weeks. We report here the use of a Perl-based script that evaluates every identified hit against orthogonal criteria and removes the false positives from the search results of PepFinder™ (also known as MassAnalyzer). Our data show that the false positives in PepFinder™ output were reduced ∼4-fold without loss of accuracy in the detection of true identifications, representing a more than 70% reduction in time compared with the manual data verification process.
Related resources
iProsite: an improved prosite database achieved by replacing ambiguous positions with more informative representations
PROSITE database contains a set of entries corresponding to protein families, which are used to identify the family of a protein from its sequence. Although patterns and profiles are developed to be very selective, each may have false positive or negative hits. Considering false positives as items that reduce the selectiveness of a pattern, then, the more selective pattern we have, a more accur...
Whole-Genome Sequence Accuracy Is Improved by Replication in a Population of Mutagenized Sorghum
The accurate detection of induced mutations is critical for both forward and reverse genetics studies. Experimental chemical mutagenesis induces relatively few single base changes per individual. In a complex eukaryotic genome, false positive detection of mutations can occur at or above this mutagenesis rate. We demonstrate here, using a population of ethyl methanesulfonate (EMS)-treated Sorghu...
GLM-based optimization of NGS data analysis: A case study of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data
BACKGROUND There are various next-generation sequencing techniques, all of them striving to replace Sanger sequencing as the gold standard. However, false positive calls of single nucleotide variants and especially indels are a widely known problem of basically all sequencing platforms. METHODS We considered three common next-generation sequencers-Roche 454, Ion Torrent PGM and Illumina NextS...
DeepSNVMiner: a sequence analysis tool to detect emergent, rare mutations in subsets of cell populations
Background. Massively parallel sequencing technology is being used to sequence highly diverse populations of DNA such as that derived from heterogeneous cell mixtures containing both wild-type and disease-related states. At the core of such molecule tagging techniques is the tagging and identification of sequence reads derived from individual input DNA molecules, which must be first computation...
Top-Down Analysis of Small Plasma Proteins Using an LTQ-Orbitrap. Potential for Mass Spectrometry-Based Clinical Assays for Transthyretin and Hemoglobin.
Transthyretin (TTR) amyloidosis and hemoglobinopathies are the archetypes of molecular diseases where point mutation characterization is diagnostically critical. We have developed a Top-down analytical platform for variant and/or modified protein sequencing and are examining the feasibility of using this platform for the analysis of hemoglobin/TTR patient samples and evaluating the potential cl...
Journal title:
Volume 9, Issue
Pages -
Publication date: 2017